Development of a Korean OCR System: Term Project in CSE 581 (Pattern Recognition)
Abstract
This is the final report for the term project in CSE 581 (Pattern Recognition). The goal of this project is to develop a character recognition system that can recognize and classify a subset of the Korean language. Training and test samples were obtained from Korean books, and additional sets were created by applying a degradation model to the obtained samples. Using geometrical, statistical, and global features, the system achieves a recognition rate of 90%. Classification was based on nearest-neighbor classification.
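The abstract names the ingredients (geometrical, statistical, and global features, plus nearest-neighbor classification) but does not say how they were computed. The following is a minimal sketch under assumed choices, not the project's actual method. Glyphs are assumed to be binary images given as lists of 0/1 rows. Three stand-in features are used:

- a geometric feature: the aspect ratio of the ink bounding box;
- a global feature: the overall ink density;
- statistical features: zoning densities on a 2×2 grid.

These features and the function names `extract_features` and `nearest_neighbor` are hypothetical. The classifier is plain 1-nearest-neighbor with Euclidean distance.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical glyph representation: binary image, 1 = ink, 0 = background.
Image = List[List[int]]


def extract_features(img: Image, zones: int = 2) -> List[float]:
    """Illustrative feature vector (assumed features, not the project's own)."""
    rows, cols = len(img), len(img[0])
    ink = [(r, c) for r in range(rows) for c in range(cols) if img[r][c]]
    if not ink:
        return [0.0] * (2 + zones * zones)

    # Geometric feature: width / height of the ink bounding box.
    r_min = min(r for r, _ in ink)
    r_max = max(r for r, _ in ink)
    c_min = min(c for _, c in ink)
    c_max = max(c for _, c in ink)
    aspect = (c_max - c_min + 1) / (r_max - r_min + 1)

    # Global feature: fraction of all pixels that are ink.
    density = len(ink) / (rows * cols)

    # Statistical (zoning) features: ink density in each cell of a zones x zones grid.
    zone_feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * rows // zones, (zr + 1) * rows // zones
            c0, c1 = zc * cols // zones, (zc + 1) * cols // zones
            cell = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            zone_feats.append(sum(cell) / max(len(cell), 1))

    return [aspect, density] + zone_feats


def nearest_neighbor(train: Sequence[Tuple[List[float], str]], x: List[float]) -> str:
    """1-NN: return the label of the training vector closest to x (Euclidean)."""
    if not train:
        raise ValueError("training set is empty")
    _, label = min(train, key=lambda item: math.dist(item[0], x))
    return label


if __name__ == "__main__":
    # Toy 4x4 glyphs labelled with the jamo "ㅡ" and "ㅣ", purely illustrative.
    bar_h = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
    bar_v = [[0, 1, 1, 0] for _ in range(4)]
    # A "degraded" horizontal bar with one ink pixel missing.
    query = [[0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]

    train = [(extract_features(bar_h), "ㅡ"), (extract_features(bar_v), "ㅣ")]
    print(nearest_neighbor(train, extract_features(query)))  # prints ㅡ
```

The features here are not scaled, so the aspect ratio dominates the Euclidean distance. A practical system would normally normalize each feature (for example to zero mean and unit variance over the training set) before the nearest-neighbor search. The single-pixel dropout in `query` is only a stand-in for a degraded sample. It is not the degradation model used in the project.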